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Abstract 

The  trust-region  subproblem  arises  frequently  in  linear  algebra  and  optimization 
applications.  Recently,  matrix-free  methods  have  been  introduced  to  solve  large- 
scale  trust-region  subproblems.  These  methods  only  require  a  matrix-vector  product 
and  do  not  rely  on  matrix  factorizations  [4,  7],  These  approaches  recast  the  trust- 
region  subproblem  in  terms  of  a  parameterized  eigenvalue  problem  and  then  adjust 
the  parameter  to  find  the  optimal  solution  from  the  eigenvector  corresponding  to  the 
smallest  eigenvalue  of  the  parameterized  eigenvalue  problem.  This  paper  presents 
a  new  matrix-free  algorithm  for  the  large-scale  trust-region  subproblem.  The  new 
algorithm  improves  upon  the  previous  algorithms  by  introducing  a  unified  iteration 
that  naturally  includes  the  so  called  hard  case.  The  new  iteration  is  shown  to  be 
superlinearly  convergent  in  all  cases.  Computational  results  are  presented  to  illustrate 
convergence  properties  and  robustness  of  the  method. 
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1  Introduction 

An  important  problem  in  optimization  and  linear  algebra  is  the  trust-region  subproblem: 
minimize  a  quadratic  function  subject  to  an  ellipsoidal  constraint, 

min  -xT  Ax  +  gTx 

s.t.  ||C'x||  <  A  , 

where  A  €  lRnXn ,  A  =  AT,  x,g  €  fftn,  C  €  JR'lxn  is  nonsingular,  A  6  777+,  and  ||  •  ||  is  the 
Euclidean  norm.  Here  IR+  denotes  the  set  of  the  nonnegative  real  numbers.  There  are 
several  applications  for  this  basic  problem.  Two  significant  examples  are  the  regularization 
or  smoothing  of  discrete  forms  of  ill-posed  problems  and  the  trust-region  mechanism  used 
to  force  convergence  in  optimization  methods. 

A  solution  x ,  to  the  problem  must  satisfy  a  relation  of  the  form  (A  +  pCTC)x *  =  —g 
with  p  >  0.  The  parameter  p  is  the  regularization  parameter  for  ill-posed  problems  and 
the  Levenberg-Marquardt  parameter  in  optimization.  The  matrix  C  is  often  constructed 
to  impose  a  smoothness  condition  on  the  solution  x*  for  ill-posed  problems  and  to  incor¬ 
porate  scaling  of  the  variables  in  optimization.  With  a  change  of  variables,  we  can  assume 
that  (7  =  7,  the  identity  matrix,  and  this  will  be  the  case  in  the  following  discussion. 

If  positive-definite  matrices  of  the  form  A  A  pi  can  be  decomposed  into  a  Cholesky 
factorization,  then  the  method  proposed  by  More  and  Sorensen  (cf.  [2])  can  be  used  to  solve 
the  problem.  In  several  important  applications,  factoring  or  even  forming  these  matrices 
is  out  of  the  question.  This  has  motivated  the  recent  development  of  conjugate-gradient 
style  matrix-free  methods  that  only  require  matrix-vector  products.  The  recent  works  of 
Sorensen  [7]  and  Rendl  &  Wolkowicz  [4]  provide  such  algorithms.  Both  approaches  recast 
the  trust-region  subproblem  in  terms  of  a  parameterized  eigenvalue  problem  and  rely 
upon  matrix-vector  products.  Sorensen’s  algorithm  provides  a  superlinearly  convergent 
scheme  to  adjust  the  parameter  and  find  the  optima]  vector  x*  from  the  eigenvector  of  the 
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parameterized  problem,  as  long  as  the  hard  case  does  not  occur.  The  so-called  hard  case 
is  characterized  whenever  the  vector  g  is  orthogonal  to  the  eigenspace  of  A  corresponding 
to  its  smallest  eigenvalue  £1 ,  and  the  solution  p  to  the  system  (A  -  6^I)p  =  -g  is  such 
that  ||p||  <  A.  For  the  hard  case,  Sorensen’s  algorithm  is  linearly  convergent.  The 
recommended  technique  used  in  [7]  to  find  the  smallest  eigenvalue  and  corresponding 
eigenvector  of  the  parameterized  problem  is  the  Implicitly  Restarted  Lanczos  Method  (cf. 
[6]),  which  meets  the  requirements  of  limited  storage  and  reliance  only  on  matrix-vector 
products.  Rendl  &  Wolkowicz  present  a  primal-dual  semidefinite  framework  for  the  trust- 
region  subproblem,  where  a  dual  simplex-type  method  is  used  in  the  general  iteration  and 
a  primal  simplex-type  method  provides  steps  for  the  hard-case  iteration.  In  their  work  the 
calculation  of  the  smallest  eigenvalue  and  corresponding  eigenvector  of  the  parameterized 
problem,  required  at  each  iteration,  is  done  using  a  block  Lanczos  routine. 

The  purpose  of  this  work  is  to  present  a  new  matrix-free  algorithm  for  solving  the 
large-scale  trust-region  subproblem.  Our  algorithm  is  similar  to  those  proposed  in  [4,  7] 
in  the  sense  that  the  trust-region  subproblem  is  solved  through  a  parametric  eigenvalue 
problem.  However,  our  algorithm  is  able  to  cope  with  the  hard  case  naturally  in  the  same 
basic  iteration.  It  is  not  necessary  to  switch  between  two  fundamentally  different  schemes 
when  a  potential  hard  case  is  present.  This  improved  scheme  is  based  upon  computing 
the  two  smallest  eigenvalues  and  corresponding  eigenvectors  of  the  parameterized  problem 
and  the  information  concerning  the  second  smallest  eigenpair  is  incorporated  whenever 
it  is  appropriate.  This  does  not  increase  the  work  or  storage  required  in  any  substantial 
way  over  the  method  proposed  in  [7].  The  iteration  we  propose  is  based  upon  a  two-point 
interpolating  scheme  that  is  different  from  [7].  We  show  this  new  iteration  is  also  superlin- 
early  convergent.  Moreover,  our  convergence  results  include  the  hard  case  naturally,  since 
no  special  iterations  are  performed.  Such  a  unified  approach  is  not  achieved  in  either  [4] 
or  [7], 

This  work  is  organized  as  follows.  In  §2  we  analyze  the  structure  of  the  problem  and 
related  results.  There  we  give  a  complete  characterization  of  the  hard  case  with  respect 
to  the  parameterized  bordered  eigenproblems.  The  detailed  algorithm  is  presented  in  §3. 
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The  local  convergence  analysis  is  developed  in  §4.  Preliminary  numerical  experiments  are 
described  in  §5.  Finally,  in  §6  we  establish  some  conclusions  and  ideas  for  future  research. 

2  Structure  of  the  Problem 

The  trust-region  subproblem  has  a  tremendous  amount  of  structure  which  must  be  ex¬ 
ploited  to  design  an  efficient  algorithm.  The  problem  we  are  interested  in  solving  is 

min  \x7 Ax  +  qTx  ,  , 

2  (1) 

s.t.  HxH  <  A  . 

Due  to  the  structure  of  (1),  its  optimality  conditions  are  both  necessary  and  sufficient, 
as  stated  in  the  next  lemma,  where  we  follow  [7]  in  the  non-standard  but  more  convenient 
use  of  a  non-positive  multiplier. 

Lemma  1:  A  feasible  vector  x*  is  a  solution  to  (1)  if  and  only  if  x*  is  a  solutioti  to  an 
equation  of  the  form  {A  —  A*/)x*  =  —  g  with  A  —  A»7  positive  semide finite,  A*  <  0  and 
A*(A  -  ||a:*||)  =  0. 

Proof:  See  [5].  □ 

The  optimality  conditions  of  (1)  are  computationally  attractive  because  they  provide 
a  means  to  reduce  the  given  n-dimensional  constrained  optimization  problem  into  a  zero- 
finding  problem  in  a  single  scalar  variable.  One  possibility  is  to  define  the  function  99(A)  = 
||(A  -  \[)~1g\\  and  to  solve  99(A)  =  A,  monitoring  A  to  be  no  greater  than  the  smallest 
eigenvalue  of  A ,  so  that  the  Cholesky  factorization  of  A  —  XI  is  well  defined.  Applying 
Newton’s  method  to  solve  — - =  0  has  a  number  of  computationally  attractive 

¥>(  A)  A 

features  (cf.  [2])  and  this  is  the  preferred  approach  when  the  Cholesky  factorization  of 
A  -  A/  is  tractable.  When  the  cost  or  storage  requirements  of  a  Cholesky  factorization 
become  prohibitive,  a  new  approach  is  required.  The  introduction  of  another  parameter 
will  make  it  possible  to  convert  the  original  trust-region  subproblem  into  a  scalar  problem 
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that  is  suitable  for  the  large-scale  setting.  The  conversion  amounts  to  embedding  the 
given  problem  into  a  parameterized  bordered  matrix  eigenvalue  problem.  To  begin,  we 


observe  that 


/  ^  \ 

(  T  \ 

«  9  \ 

A 

(  1  XT  ) 

V  / 

yS  A  ) 

1, 1 J 

(2) 


where  i p(x)  =  ^xT  Ax  +  gTx.  As  in  [7],  we  denote  the  bordered  matrix  appearing  on  the 
right  of  (2)  as  Ba.  Thus,  using  y  =  (1  xT)T,  and  denoting  by  the  first  canonical  unit 
vector  of  lRn+1 ,  problem  (1)  can  be  rewritten  as 


l  „,T  j 

my 

/  1  A  2  AT ,,  _ 


min  ly1  Bay 


s.t.  y1  y  <  1  +  A2,  e[  y  =  1, 
which  suggests  that  the  desired  solution  might  be  found  in  terms  of  an  eigenpair  of  Ba.  If 
an  eigenvector  q  corresponding  to  an  eigenvalue  A  of  Ba  can  be  normalized  such  that  its 
first  component  is  one,  i.e.  q  =  (1  xT)T ,  then 

/ 


a  g 
\  9  A 


t\  (  j  \ 


) 


\  x  ) 


From  this  it  follows  that 


and 


Hence, 


a 


a  -  A  =  -gTx 
(A  -  A I)x  =  -g 
-  A  =  gT(A  -  Xiy'g  =  £)  ■ 


7] 


=i 


A  ’ 


(3) 

(4) 

(5) 


where  {£j}  are  the  eigenvalues  of  A  and  {7^}  are  the  expansion  coefficients  of  g  in  the 
eigenvector  basis.  The  case  where  the  eigenvector  of  Ba  has  a  zero  first  component  will 
be  analyzed  below.  We  shall  denote  the  eigenvalues  of  A  by  6j  and  such  that 


<5, 


8(  <  6r+i  <  •  •  •  <  8„ 


(6) 


so  that  61  is  the  smallest  eigenvalue  of  A,  with  multiplicity  l.  The  eigenvalues  of  Ba  shall 
be  denoted  by  (Aj(a)}  with  Ai(a)  <  A2(a)  <  ■  ■  •  <  All+i(a). 
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As  a  consequence  of  Cauchy’s  interlace  theorem  (cf.  [3],  p.  186),  the  eigenvalues 
of  the  matrix  A  interlace  the  eigenvalues  of  the  bordered  matrix  Ba.  This  can  also  be 
seen  directly  from  equation  (5).  Therefore,  the  smallest  eigenvalue  Ai(a)  of  Ba  satisfies 
Ai(a)  <  6\ .  This  assures  the  matrix  A  -  X\{a)I  is  positive  semidefinite,  regardless  of  the 
value  of  a.  Furthermore,  Ai(a)  is  well  separated  from  the  rest  of  the  spectrum  of  Ba  for 
small  values  of  A  and  it  is  expected  that  a  Lanczos-type  algorithm  will  be  successful  in 
computing  this  extreme  eigenpair. 

Equations  (3)-(4)  express  A  and  hence  x  implicitly  in  terms  of  a,  suggesting  the 
definition  of  a  convenient  function  as  follows: 

<K A)  =  9T(a  ~  9  =  ~9fx  , 

and  so, 

<t>'{ A)  =  gT(A  -  A I)~2g  =  x1  x  , 

where  differentiation  is  with  respect  to  A  and  (A  —  A I)x  -  —g.  Figure  l.a  shows  the 
typical  behavior  of  <j>.  It  is  worthwhile  noticing  that  both  4>  and  <f>'  are  readily  available 
and  contain  valuable  information  with  respect  to  problem  (1). 

Finding  the  smallest  eigenvalue  and  a  corresponding  eigenvector  of  Ba  for  a  given 
value  of  a  and  then  normalizing  the  eigenvector  to  have  its  first  component  equal  to  one 
will  provide  a  means  to  evaluate  the  rational  function  (j>  and  its  derivative  at  appropriate 
values  of  A.  If  a  can  be  adjusted  so  the  corresponding  x  satisfies  (j>'{ A)  =  xTx  =  A2  with 
a  —  A  =  (j>(\),  then 

{A  —  \I)x  =  —  g  and  A(A  —  ||x||)  =  0 

with  A  -  XI  positive  semidefinite.  If  A  <  0  then  x  is  optimal  and  solves  the  trust-region 
subproblem. 

In  case  problem  (1)  has  an  unconstrained  minimizer,  A  >  0  will  be  found  with 
||z[|  <  A  during  the  course  of  adjusting  a.  As  a  consequence,  A  is  positive  definite 

and  the  desired  interior  solution  can  be  computed  using  the  conjugate-gradient  method. 

It  remains  to  consider  the  possibility  that  the  eigenvector  of  the  bordered  matrix 
Ba  associated  to  Ai(a)  has  first  component  zero  and  thus  cannot  be  normalized  to  have 
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its  first  component  equal  to  one.  However,  this  is  equivalent  to  the  so-called  hard  case, 
analyzed  in  [2]  for  problems  where  the  Cholesky  factorization  for  A  —  \I  is  affordable 
and  discussed  in  [4,  7]  in  the  large-scale  context.  Figure  l.b  shows  </>  in  presence  of  the 
hard  case,  where  one  can  see  that  is  not  a  pole  of  cj).  Next,  we  state  the  precise  result 
describing  the  hard  case  for  problem  (1),  which  can  only  occur  if  the  vector  g  is  orthogonal 
to  the  eigenspace  7q  =  {q  |  Aq  =  <p5i}  associated  to  6i . 

Lemma  2:  Assume  g  is  orthogonal  to  and  let  p  =  —  (A  —  S\I)^g,  where  f  denotes 
the  Moore-Penrose  generalized  inverse.  If  8\  <0  and  ||p||  <  A,  them  the  solutions  to  (1) 
consist  of  the  set  {x  \  x  —  p  +  z  ,  z  G  <iq  ,  ||a:||  =  A}. 

Proof:  See  [5].  □ 

When  g  is  orthogonal  to  <Jq,  there  is  a  potential  hard-case  situation.  This  condition 
has  an  intriguing  consequence.  In  this  case  it  may  be  impossible  to  suitably  normalize 
the  eigenvector  of  interest.  For  all  values  of  a  greater  than  a  certain  critical  value  5,  the 
eigenvectors  corresponding  to  the  smallest  eigenvalue  of  Ba  will  have  a  zero  first  com¬ 
ponent.  However,  in  the  exact  hard  case,  there  is  a  well  defined  eigenvector  depending 
continuously  on  a  that  can  be  safely  normalized  for  all  values  of  a.  When  a  exceeds  the 
critical  value  5,  this  parameterized  vector  corresponds  to  the  second  smallest  eigenvalue 
of  Ba.  A  complete  understanding  of  this  case  has  led  to  the  main  algorithm  of  this  paper. 
This  understanding  is  based  upon  the  following  results. 

Lemma  3:  For  any  a  G  IR  and  q  £  <!q,  ,  (0  qT)T}  is  an  eigenpair  of  Ba  if  and  only 

if  g  is  oi'thogonal  to  S\ . 

Proof:  The  proof  is  straightforward  since  g  T  5]  and  Aq  —  q6 1  are  equivalent  to 
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independently  of  a.  □ 


The  previous  lemma  establishes  that  dim  Si  <  dim  Si  (a)  =  dim  Af(Ba  -  6\I )  and 
shows  that  {(0  qT)T  \  Aq  =  q8\}  C  Af{Ba  —  I)  where  J\f  denotes  the  null  space.  These 
sets  are  equal  for  all  but  one  exceptional  value  of  a.  The  following  result  states  that  there 
is  a  unique  value  of  a  such  that  dim  Si  (a)  =  dim  S]  +  1. 

Lemma  4:  Suppose  that  g  is  orthogonal  to  Si  and  p  =  —{A  —  8\I)1[g.  If  a  =  8\  —  gTp 
then  {Si  ,  (1  pT)T}  is  an  eigenpair  of  B~.  Moreover,  (1  pT)T  is  orthogonal  to  (0  qT)T ,  for 
every  q  6  Si . 


Proof:  Suppose  that  5  =  8\  —  g!p  with  p  =  —  (A  —  8\I)^g.  The  assumption  g  _L  Si  implies 

(4  -  M)p  =  -u  -  -  6xI)'g  =  -g 


and  thus  the  eigenvalue  relation 


/ 


\ 


«  9 

\9  A  J 
follows  immediately.  Now,  if  q  £  Si  then 

lx 


r  J 


'i' 


\v  j 


0  q1 

'  \  P 

since  {A  —  8\I)q  =  0.  □ 


=  qTp  =  -qT(A  -  6M)jg 

=  qT(A  -  6iI)*(A  -  8]I)p  =  qT(A-6,I)(A-6,I)ip  =  0, 


The  result  in  Lemma  3  was  also  stated  in  [7]  and  the  idea  behind  Lemma  4  was 
presented  in  [4].  These  results  are  the  heart  of  the  algorithm  developed  in  the  next  section 
since  they  provide  the  necessary  tools  for  handling  the  hard  case  in  the  same  iteration  de¬ 
signed  for  the  general  case.  The  next  corollary  summarizes  the  main  results  from  Lemmas 
3  and  4. 
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Corollary:  Suppose  that  g  is  orthogonal  toS\.  If  a  =  6\  +gT(A-Siiyg  then  dim  <Si(a)  = 
dim  +  1  and  for  any  other  value  of  a,  dim  S\(a)  =  dim  <Sj .  Moreover,  if  {  q\,  •  •  • ,  qt  } 
is  an  orthogonal  basis  for  Si  then 


is  an  orthogonal  basis  for  S]  (3)  and 


{ 

/  \ 

(  \ 

1 

0 

1  0  1 

< 

| 

’  f 

1 

l 

l  ) 

\  qt  )  J 

an  orthogonal  basis  for  S\(a), 

a 

T2  a. 

Numerical  difficulties  can  be  expected  when  the  vector  g  is  nearly  orthogonal  to  the 
eigenspace  S\.  If  this  happens,  there  still  exist  A*  <  Sj  and  x*  such  that  (A  -  A */)x*  = 
— g,  ||**||  =  A,  with  A*  quite  close  to  Aj .  We  call  this  situation  a  near  hard  case  and 
Figure  l.c  illustrates  it.  In  the  detail  shown  in  Figure  l.d,  one  can  see  that  the  derivative 
(f)'  changes  rapidly  for  A  close  to  so  the  problem  of  finding  A*  satisfying  the  correct 
slope  (f)\ A*)  =  A2  is  very  ill-conditioned,  i.e.  a  small  change  in  data  produces  a  large 
change  in  the  solution. 


3  The  Algorithm 

Keeping  in  mind  the  availability  of  a  well-suited  variant  of  the  Lanc.zos  method,  namely 
the  Implicitly  Restarted  Lanczos  Method  (cf.  [6]),  we  will  develop  a  rapidly  convergent 
iteration  to  adjust  a  based  on  this  process.  Our  goal  is  to  fit  a  so  that 

a  —  A  =  <^>(A)  ,  <j>'{\)  =  A2  , 

where 

<KA)  =  -gTx  ,  <£'(A)  =  xTx  , 

with  (A  -  A I)x  =  -g. 
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The  approach  of  this  work  is  similar  to  the  one  in  [7]  in  the  following  sense.  We 
compute  a  function  <j)  which  interpolates  </>  and  <//  at  two  properly  chosen  points.  Then, 
from  the  interpolating  function  (f>  we  determine  A  satisfying  (j>'{ A)  =  A2.  Finally,  we  use  A 
and  (j){ A)  to  update  the  parameter  a  and  compute  the  next  point  {A,  x }.  The  new  elements 
in  our  algorithm  are  the  introduction  of  safeguards  for  the  sequence  in  a,  the  usage  of  the 
information  relative  to  the  second  smallest  eigenvalue  of  matrix  Ba  and  the  introduction 
of  a  different  interpolating  scheme,  where  the  currently  available  information  is  exploited 
to  a  greater  extent. 


3.1  Interpolating  Scheme 


To  begin  the  iteration,  we  need  a  single-point  interpolating  scheme.  As  in  [7],  we  use 
an  interpolating  function  of  the  form 

m  =  £  - 

Let  {Ao,£o}  denote  the  point  corresponding  to  the  initial  op  so  that 
a  -  A0  =  -g1  x o  with  (A  -  XoI)xq  =  -g  . 


Requiring  4>( A0)  =  </>(A0)  =  -gTx 0  and  <£'(A0)  =  <j>'( A0)  =  x%x0  leads  to  a  straightforward 
derivation  of  expressions  for  the  coefficients  6  and  72, 

{gTx  o)2 


r  ,  9TX  0  , 

o  =  Aq - t —  and  7  = 


XqXq 


XqXq 


It  is  easy  to  see  that  6 


XqAxq 

t 

x^x0 


,  which  has  the  desirable  feature  <  6.  Computing  A 


such  that  (l>'(  A)  =  A2  we  have 


A  =  ^  + 


T 

g  x  0 


ll*o||A 

and,  after  a  little  algebraic  manipulation,  the  updating  formula  for  a  is  shown  to  be 


07  —  A  +  <^(A)  —  otQ  + 


«o  -  Aq  f  A  —  ||a,~0||  ^ 


A  + 


1 


lko|| 


(7) 


Ikoli  V  A 

This  method  is  linearly  convergent  and  may  be  slow  in  some  cases,  so  it  will  be  used  just 
to  obtain  a  second  iterate  from  an  initial  guess,  to  provide  the  starting  values  needed  in  a 
method  using  two  points. 
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The  two-point  method  is  based  on  using  the  four  pieces  of  available  information  at 
the  fc-th  iteration,  namely  ^>( ),  <j!/(Afc_i),  4>(^k)  and  <//(Afc).  We  compute  A  such  that 


1  =  1  /  A, -A  \  1  /  A  -  Afc_!  \ 

A  ^'(Afc-i)  \A k  -  \k-\  J  \A )  \^k  -  Afc_i  J 


(8) 


obtaining 

X  Afc-i||gfc-i||(||»ib||  -  A)  +  Afcl|xfc[|(A  -  ||gfc-i||)  rQv 

A(|M-|l**-ill)  ’ 

This  is  equivalent  to  defining 

k^)=-^Tx+V  (10) 

for  any  7/  and  computing  A  such  that  =  One  can  easily  verify  using  (8)  that 

V^(A)  A 

2  (a k  -  Afc-i)2iK-iir2iizA,-i[2 

7  -  (Nil- IN-,11)2 


and 


Afc||(Cfc|[  -  Afc_-|  H^fc-!  || 

11**11  -  ll**-ill 


7 


r,  where  </>(A)  is  the  value  we  are  going  to  estimate  in  order 


Ideally,  t)  =  <j>(\)  - 

o  —  A 

to  update  a.  Taking  advantage  of  the  available  values  <^(A*,_i)  and  <^>(Afc),  we  define 

y2 

7 ij  =  4>(\j)  —  - - — ,  j  —  k  —  1  and  j  =  k.  Applying  the  linear  interpolation  philosophy 

c  —  Aj 

and  defining  the  weights  by  means  of  the  already  computed  value  A,  we  choose 


V  = 


Afc  -  A 
Afc  -  A *_i 


Vk- 1  + 


A  —  A 
A  k  —  A  /j — i 


Vk 


It  is  worthwhile  noticing  that  the  pole  6  in  the  interpolating  function  (10)  satisfies 
6  >  max{Afc_i,  A*,.}.  Therefore,  6  is  not  strongly  attached  to  <?>i,  as  in  [7],  but  can  move  to 
the  right  towards  (see  (6)),  depending  on  the  occurrence  or  not  of  a  potential  exact 
or  near  hard  case. 

As  indicated  by  Lemma  3,  an  exceedingly  small  first  component  v\  of  the  eigenvector 
{v\  u^)T  of  Bak  corresponding  to  Ai(afc)  will  indicate  a  potential  hard  case.  However, 
according  to  Lemma  4,  there  will  be  an  eigenvector  corresponding  to  the  smallest  eigen¬ 
value,  A! (afc)  =  fa,  with  a  substantial  first  component  precisely  when  afc  assumes  the 
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special  value  5  =  ^  —  gTp,  with  ( A  —  6\I)p  =  —g.  We  propose  to  use  second  small¬ 
est  eigenpair  of  the  bordered  matrix  whenever  a  potential  hard  case  is  detected.  As  we 
shall  explain,  not  only  can  the  size  of  the  iterates  ||a;fc||  be  kept  under  control,  but  also 
convergence  of  { A ^ m ^ }  to  {<5i,p}  will  be  attained  by  driving  the  parameter  ak  to  the 
value  a  given  by  Lemma  4.  Moreover,  using  the  second  smallest  eigenpair  of  Bak  prevents 
numerical  difficulties  in  a  near  hard-case  situation.  Computationally,  the  first  component 
v\  of  the  unitary  eigenvector  (//]  uf)T  of  Baic  corresponding  to  the  eigenvalue  Ai(afc)  is 
declared  sufficiently  small  if  the  condition  ||g|||zzi|  <  £yj  1  -  v\  holds  for  a  given  e  £  (0, 1). 
This  is  motivated  as  follows.  Since  ( A  -  Ai(a?fc)/)ui  =  -gv  we  have 

H(A-A1(cvfc)/)u1il  |lg||h| 

IWI  y/l  -  v\ 


and  hence  ||(?|||//i|  <  e^J  1  -  u\  assures  that  ||(A  — Ai(afe)/)«i||  <  £||uj  ||,  which  can  be  made 

scale  independent  by  choosing  e  =  £]|A||.  In  other  words,  {Ai(a*,.),ui}  is  an  approximate 

eigenpair  of  A,  according  to  Lemma  3.  For  defining  the  point  { A^.-,  }  at  each  iteration  we 

compute  the  two  smallest  eigenpairs  of  Bak:  {X-l(ak),  (u\  u{)T)  and  {A2(o:/c)>  (v-2  u7)t}.  If 

|/'i  |  is  too  small,  that  is,  if  ||p|||r/i  |  <  sJ  1  —  i/f  then  Xk  =  A2(a/:)  and  xk  =  — .  Otherwise, 

V 

we  set  Xk  =  X^(ak)  and  xk  =  — . 

v\ 


Since  A*,._i  and  Xk  are  not  constrained  to  (— oo,<5i]  but  might  well  belong  to  the 

interval  the  value  A  given  by  (9)  may  be  greater  than  b\ .  Therefore,  given  6s, 

a  current  upper  bound  for  ^ixif  A  >  6$  we  simply  use  A  =  6s-  After  some  manipulation 

and  denoting  by  u  =  - — - — - - -,  =  <t>( Afc-i)  and  <f>k  =  <f>( Xk),  the  updating  formula 

Xk  —  Afc_] 

for  a  can  be  expressed  by 


ziyt+i  —  A  +  u>4>k- 1  +  ( 1  — 

,  ~  INzijlK^zi  ~  j) 

w||*fc||  +  (1  -  w)||*fc_1||  (Xk  -  X *_i) 

=  ioak-\  +  ( 1  -  a >)ak 

,  lkfc-illlkA.-lKlkA.-il  ~  lkt-i||)(Afc-i  -  X)(Xk  -  A) 

wlkfc||  +  (l  -w)||xjt-i||  (Ajfc  —  Afc_i) 


where  otk-\  =  Xk-\  +  <h-\  and  ak  =  Xk  +  (f>k- 
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3.2  Safeguarding 


Safeguarding  must  be  introduced  to  assure  global  convergence  of  the  iteration.  In  [4] 
bounds  are  presented  for  the  optimal  parameter  a*  =  A„  —  gTx *: 


^  <  «*  <  +  ||5||A  . 


(12) 


However,  computing  a  good  approximation  to  6\  can  be  as  nearly  as  expensive  as  solving 

the  given  trust-region  subproblem.  For  this  reason,  we  shall  replace  the  above  bounds 

by  some  simple  alternatives.  First,  note  that  any  Rayleigh  quotient  6s  =  -  —  gives 

v 1  v  ' 

an  upper  bound  on  6j.  If  the  diagonal  of  the  matrix  A  is  explicitly  available,  we  take 

6.s  =  min{a,,|i  =  1,. .  .,n},  otherwise  we  take  6$  =  — ~ —  with  v  randomly  chosen.  From 

v1  v 

(12)  we  see  that  a*  <  ay  =  6s  +  ||<7|| A.  Since  a  <  0  implies  Ba  is  not  positive  definite, 
we  set  a0  =  min{0,au}  to  assure  that  Ai(cv0)  <  0.  Upon  solving  for  A1(ao)  and  setting 


6j  =  Ai(a0),  we  immediately  have  a  lower  bound  aL  =  6j-  <  a*,  since  the  interlacing 

property  implies  6i  <  6\.  Using  this  simple  scheme  to  obtain  6j  and  6s  as  an  initial  lower 
and  upper  bounds  for  6},  we  can  start  with 


=  6]  - 


M 

A 


and  «(/  =  6S  +  ||flf||A  . 


(13) 


From  (13)  we  obtain  initial  bounds  for  the  sequence  in  A: 

A l  =  6,-  Hill  (^  +  a)  <  A*  <  6S  =  Xu  .  (14) 

The  upper  bound  6s  is  updated  every  iteration  using  information  from  the  smallest 
eigenpair  of  the  bordered  matrix:  6s  =  min  {  6S ,  l  where  uLAu'  =  Ai  (at)  -  m  sIhl. 

(  «|  «l  J  uj  111  '  '  u(  Ul 

As  stated  in  §3.1,  whenever  a  potential  hard  case  is  detected,  {Ai(a*..),ui}  approximates 
an  eigenpair  of  A  and  A^a^)  is  a  very  good  approximation  to  6\.  Thus  6s  becomes  a 
sharp  estimate  of  6 1 . 

It  is  worth  mentioning  that  Xk-\  or  A*,  might  not  belong  to  the  current  interval 
[Al,  Xu].  Nevertheless,  such  bounds  are  decisive  for  updating  aL  and  ay.  In  case  afc+1 
computed  by  (11)  does  not  belong  to  [ap,au],  we  use  cifc+i  =  —  ■— — —  instead.  Observe 
that  [aL,au]  C  [AL  +  <£(Al),A v  +  <^>(At/)]  and  so  afc+1  =  A^a^.+i)  +  ^(Ai(afc+1))  €  [AL  + 
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4>{\l)i  A u  +  <A(A [/)].  Therefore,  since  <f>  is  strictly  increasing  for  A  <  <§i  and  Ai(a^+i)  <  h 
holds,  we  have  Ai(afc+i)  G  [AL,  Xy],  Let  (u-i  uJ)T  be  an  eigenvector  of  Bak+1  corresponding 
to  Ai(afc+i).  If  ||wi//>i||  <  A,  we  update  Xy  =  A1(afc+1)  and  aj,  =  ay+i ■  Otherwise, 
Xy  =  Ai(aA,.+i)  and  ay  =  .  In  other  words,  the  length  of  the  interval  [ay,  ay]  is 

reduced  at  every  iteration. 


3.3  Initializing  a 

As  mentioned  in  §3.2,  there  is  a  simple  choice  for  initializing  a:  ao  —  min{0,  au}-, 
where  ay  is  given  by  (13).  This  assures  that  Aj(a:o)  <  0  but  it  has  no  additional  properties. 
In  an  attempt  to  improve  this  initial  guess,  we  have  developed  a  more  sophisticated  hot- 
start  strategy,  based  on  the  Lanczos  process  for  A.  To  begin,  we  compute 

AV  =  VT  +  fej  (15) 


where  VTV  -  Ij,  with  Ij  the  identity  matrix  of  order  j  (  j  <  n  ),  T  G  1R,JXJ  tridiagonal, 
VT  f  =  0  and  ej  denotes  the  jth  canonical  unit  vector  of  III3 . 

The  hot-start  strategy  consists  of  changing  the  variables  in  (1)  using  x  =  Vy  and 
solving  the  j- dimensional  problem 


min  \yTTy  +  gT  V  y 
s.t.  \\y\\  <  A  . 


In  other  words,  a  solution  {#*,  i/*}  satisfying  ( T  —  9*I)y *  =  —  V1  g,  9*(\\y*\\  —  A)  =  0,  T  —  9*1 
positive  semidelinite,  0*  <  0,  ||i/*||  <  A  and  0*  <  6 1  is  obtained  applying  the  algorithm 
proposed  in  [2],  based  on  Cholesky  factorization  of  the  tridiagonal  matrix  T  —  91 ,  9  <  6\. 
Then,  the  initial  value  to  be  used  is  a  =  9*  —  gTVy *.  The  hot  start  for  a  may  improve 
the  convergence  by  reducing  the  number  of  iterations. 

The  Lanczos  process  for  A,  given  by  (15),  can  be  used  to  compute  the  smallest 
eigenpair  of  Bao: 


( 

\ 


op 

y 


\ 

1 

«)- 

(> 

0  ^ 

/ 

/ 

A 

V) 

V 

ao 


VTg 


gTV 

T 


+ 


f  0  A 

i f  > 


cj+ 1  • 


(16) 
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Although  in  general  VTg  ^  fie\,  if  in  the  Lanczos  process  for  A  no  restarts  are  used,  the 
choice  V e-i  =  5/||ff||  provides  a  tridiagonal  matrix  on  the  right  side  of  (16). 

3.4  Stopping  Criteria 

Before  introducing  the  criteria  for  declaring  convergence  in  the  algorithm,  we  include 
for  completion  the  following  lemma,  which  is  a  restatement  of  a  result  given  in  [2],  was 
presented  in  [7]  and  provides  a  stopping  criterion  for  the  hard  case. 

Lemma  5:  Let  e  &  (0,1)  be  given  and  suppose  (A  —  \I)p  =  —g,  A  <  0  with  A  —  A I  positive 
semidefinite.  If  \\p  +  z\\  =  A  and  zT(A  —  A I)z  <  —e(gTp  -f  AA2)  then 

V’*  <  ip(p+  z)<\(  1  -  £)(/yP+  AA2)  <  (1  -£)V»*  ,  (17) 

where  V’*  <  0  is  the  optimal  value  of  (1). 

It  follows  directly  from  (17)  that  \x/>(p-\-z)  —  tA*|  <  and  so  p-\-z  is  nearly  optimal 

for  (1). 

Given  the  tolerances  e&,eu,£hc  €  (0, 1),  convergence  is  stated  as  follows: 

(0  ^  I  11**11  -  A|  <  saA,  then  Xk  is  a  binding  solution,  with  corresponding  multiplier 
Afc. 

(ii)  If  the  condition  ||_g|| |;/i  |  <  euyj  1  —  //2  holds  at  least  once,  we  have  2  =  1  ,  an 

approximation  to  an  eigenvector  of  A  corresponding  to  .  If  ||xfc||  <  A  then  compute 
r  such  that  ||x*  +  tz\\  =  A.  If  t2(ztAz  -  A if)  <  -£hc(gT%k  +  A&A2)  and  A^  <  6$ 
then  the  hard  case  is  declared  and  the  solution  is  given  by  {A k,xk  +  tz}. 

(Hi)  If  ||**||  <  A  and  0  <  A^  <  6s  then  the  solution  to  (1)  is  inside  the  trust  region  and 
it  is  computed  by  solving  the  symmetric  positive  definite  linear  system  Ax  =  —g, 
possibly  using  conjugate  gradients.  The  corresponding  multiplier  is  zero. 


Finally,  let  us  put  all  these  pieces  together  and  establish  our  algorithm. 
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Algorithm  1:  Let  e„  £  (0,1),  A  £  lRnxn  ,  A  =  AT ,  g  £  lRn ,  A  £  1R+  . 

■  Initialization 

•  Compute  6$  >  6i  and  initialize  oq. 

■  Compute  the  two  smallest  eigenpairs  of  Bao  : 

{Ai(a0),(/'i  uJ)T}  and  {A2(ft0),(//2  uJ)T). 

•  Initialize  safeguard  bounds  using  (13)-(14). 

•  If  ||<HIM  <  £„\/l  -  v l  then  A0  =  A2(a0)  and  x0  =  — . 

v  u2 

Else  Aq  =  Ai(ao)  and  xq  =  —  . 

•  One-point  interpolation.  Compute  Oj  using  (7)  and  safeguard  it. 

•  Compute  the  two  smallest  eigenpairs  of  Bai  : 

{Ai (<*!),(/✓!  uJ)T}  and  {A2(a,),  (/'2  «2’)r}  • 

•  k  -  1 . 

■  Repeat 

■  If  Iblll'hl  <  £vJ  1  -  v\  then  Xk  =  A 2(ak)  and  xk  =  — . 

v  1/2 

\  \  /  \  tii 

Else  Xk  =  Ai(afc)  and  xk  =  — . 

"l  ^ 

■  Two-point  interpolation.  Compute  A  using  (9).  If  A  >  6$  then  A  =  6s • 

■  Update  ak+i  =  A  +  d>(A)  using  (11)  and  safeguard  it. 

•  Compute  the  two  smallest  eigenpairs  of  Bak+l : 

{Ai(a^+i),(;2i  uJ)T}  and  {A2(«a.-+i  ),  {v*  uJ)T}  . 

•  k  < —  k  A  1 . 

•  Update  safeguards. 

•  Until  convergence 


4  Local  Convergence  Analysis 

In  this  section  it  is  analyzed  the  local  convergence  rate  of  Algorithm  1.  Before  presenting 
the  local  convergence  result,  it  is  convenient  to  introduce  some  notation.  The  value  of  a 
that  gives  the  optimal  parameter  A*  and  corresponding  solution  vector  will  be  denoted 
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by  a*.  The  notation  A*  =  A  —  A*/,  A#  =  A  —  8\I  and  Ak  =  A  —  A &/,  for  any  k,  will 
be  used.  Also,  0(1)  will  denote  a  quantity  whose  absolute  value  is  bounded  by  a  positive 
constant. 

The  next  lemma  provides  useful  tools  for  the  convergence  analysis.  Throughout,  we 
denote  by  j  the  Moore-Penrose  generalized  inverse. 

Lemma  6:  Suppose  A{X{  —  —g  and,  AjXj  =  —  g  ,  where  A;  ,  A j  <  8(.+\ .  Then 

(xi- Xj)Tg  =  (Xj  -  Xi)xfxj  (18) 

and 

xjxi  -  xj x,j  =  (A i  -  X j)p(i,j)  (19) 

where,  for  A; ,  Xj  , 

=  xfAj'xi  +  xjA~lx,j  (20) 

and,  for  X ,•  =  i§i  or  Xj  =  , 

p(hj)  =  xj A]xi  +  xj a\x,  .  (21) 

Moreover,  if  (A  —  I)p  =  —g,  then 

(xi  -  Xj)Tp  =  (A,  -  Xj)xJ A\p  (22) 

and 

(xi  -  p)Tg  -  (6-i  -  X i)pTXi  .  (23) 

Proof:  If  either  A,-  =  or  Aj  =  then  5  6  Range(A  —  6^1)  and  thus 

Xi  =  -A\g  ,  Xj  =  and  g  =  AxA\g  = 

Since  Aft  =  M_1  for  any  nonsingular  matrix  M  and  since  all  of  the  above  matrices 
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commute,  we  have  in  any  case  that 

xj Xi  -  xj Xj  =  gT(A] 2  -  A]2)g 

=  gT{A^A]Af  -  A\2AlA]2)g 
=  9T  A\2(  A  j  —  Ai)(Aj  +  Ai)A^2g 
=  (A;  -  \j)xJ(A]  +  A])x3 

Moreover,  in  either  case 

(x-i  —  x,j)Tg  =  gT{A]-A\)g 

=  gT A\(Ai  -  Aj)A]g 
=  (Xj  —  Xi)x]  Xj  . 

Furthermore,  if  (A  —  6\I)p  —  — g ,  then  in  both  cases  we  have 

(*«•  -  xj)TP  =  9T(A]  -  A\)p 

=  gTA](Ai  -  Aj)A]p 
=  (A,  -  Xj)xJaIp  . 

Since  p  =  -A^g,  we  also  have 

(Xi  -  p)Tg 

what  completes  the  proof.  □ 

It  should  be  noted  that  the  formulas  obtained  in  Lemma  6  are  easily  extended  to 
any  values  of  A,-  and  Xj  where  the  pseudo-inverse  is  used  whenever  one  of  these  numbers 
happens  to  be  an  eigenvalue  of  A.  Moreover,  formulas  ( 1 8)  (23)  also  hold  with  either 
{A i,Xi}  or  {Xj,Xj}  replaced  by  {A*,  a;*}. 


=  9T(A\  -  A\)g 
=  9T a\(A{  -  A#)A^g 
=  (Ai  -  A i)xjp. 


Through  the  updating  formula  (1 1),  we  establish  the  following  result  relating  Afc+1  -  A* 
with  Xk-\  -  A*  and  Afc  -  A*.  The  assumption  that  the  trust  region  constraint  binds  at  the 
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solution  does  not  exclude  the  possibility  of  a  hard  case.  In  fact,  in  the  hard  case  Lemma  2 
implies  that  [A  —  6\I)p  =  — g ,  ||p||  <  A,  and  one  can  choose  X*  =  p  +  z,  with  z  G  <5}  such 
that  ||x*||  =  A. 

Lemma  7:  Let  {A*, a;*}  be  a  solution  to  problem  (1)  with  ||z*||  =  A.  Then,  there  is  a 
neighborhood  B  of  A*  such  that  if  Xk-i,Xk  G  B  then  the  iterate  A*.+i  produced  by  Algo¬ 
rithm  1  using  formula  (11)  satisfies 

Afc+i  -  A*  =  (Xk-i  -  A,)(A*  -  A.)0(1)  .  (24) 


Proof:  We  divide  the  analysis  in  two  cases:  A*  <  A]  and  A*  =  6] .  First,  suppose  that 
A*  <  Then  A *  is  nonsingular  and  there  exists  r0  G  (0,<5]  -  A*)  such  that  A  -  XI 
is  nonsingular  for  every  A  G  B0  =  {A  |  |A  -  A*|  <  r0}.  Let  A^-i  and  Xk  G  B0.  Then 
Xk--[,Xk  <  ^  and  from  (9)  it  follows  that 

IKH^fc-i  II  4-  Ha?fc|l)(  A  —  ||xfc||) 

p(k,  k  —  1)A 


|A-A*|  = 


ll(ll3'A--i  ||  +  IKHM*,fc)|A*  -  xk\ 

A(A  +  ||x*:||)/>(k,  k  —  1) 


(25) 


Let  M  —  max||(T  —  XI)  '^H,  m  =  min||(T  -  XI)  and  let  p and  pj;  satisfy  0  <  pl  < 

Aei?o 

xT(A  -  XI)~1x 

-  <  pu  i  for  all  X  E  Bo  and  O^rG  IRn.  Then 


xTx 


l|sfc-i||(||gfc-i||  +  Hsfcll)  M2 
A(A +11x^11)  _  m2 


Moreover,  due  to  (20), 


p(*,  k)  =  xlx/*AJX*  +  xlxfkA^Xk  <  2  M2Pu 

xk  Xk 


xjx* 


and 


p(k,k  -  1)  =  x\xk 


tkAk-\Xk  .  t 

i  3'  k— 1  71 

Xk-\Xk-\ 


xl-\AkXx  *_! 


T 

Xk  xk 


>  2  m1  pi,  . 


Therefore,  by  (26)-(28),  it  follows  from  (25)  that 


(26) 


(27) 


(28) 


|A-A,|  <  ^i|A*-Afc| 
mPpL 
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Let  r\  =  —  A* 


ro  >  0  and  r2  =  min 


r  i  m4ph 
2  M4pu,r° 


For  \k-i ,  A*;  £  B‘2  =  {A  |  |  A  —  A*|  < 


^  V\  ~ 

r2},  we  have  |A  —  Aa.|  <  —  and  thus  A  <  Si  <  6s-  The  neighborhood  B  is  given  by  8-2  in 
this  case.  Hence,  by  (9)  and  (11), 


ak+\  —  ri  +  r2  , 


where 


T\  = 


afc-i||»fc-1||(||*fc||  -  A)  +  ajt||a;fc||(A  -  Hm^-i | 


and 


A(||*fc||  -  ||) 

~  -  ikA.— i  ll)(A  -  \\xk\\) 

A(|M-||*fc_,||) 

Since  ||x*||  =  A,  by  ( 19)— (20)  we  have 

IN-illlKIKIN-ill  +  ||®fc||)  (A  -  ||xfc_i  || )( A  -  11**11) 


T-2 


T'2  = 


p(k  —  1,  k)  A 

||®Jt-i||||*Jt||(||**-i||  +  Ikfcll)  (A,  -  Afc_i)(A*-  Afc)p(fc,  *)p(k  -  1,*) 


p(k  -  1 ,k) 

(Afc_,  -A,)(Afc-A,)0(l), 


A(A +  !!**_,  IIXA  +  IM) 


where  the  0(1)  constant  in  (29)  follows  from  the  hypothesis  A*_i ,  A*  G  B. 
Now,  observe  that 

(<*fc-l  -  a*)|K-i||(||zfc||  -  A)  +  (ak  -  o„)||:rA. ||(A  -  ||:rA._,| 


&k+ 1  - 


(29) 


+  r2  .  (30) 


A(M-IK-iII) 

However,  from  equations  (3)-(4)  and  (18),  for  any  j  we  have 

ctj  —  a*  =  Aj  —  A*  —  gT(x.j  —  a;*) 

=  (Aj  -  A*)(l  +  xjx*) . 

Using  this  along  with  ( 1 9)  (20) ,  (30)  becomes 

(Afc-i  -  A*)(l  +  a:[_i^) II**-!  ||(A*  -  A *)p(k,*) 


(Afc+i  -  A*)(l  +  a^+1a;*)  -  r2  = 


A(||ar*||  -  ||*fc- 


xk\\  +  A) 


(A k  -  A*)(l  +  xJx*)\\xk\\(\k-\  -  \*)p(k  -  1,  *) 
A(||**||  -  ||**_1||)(||**_1|]  +  A) 


(Afc-i  —  A*)(A  k  —  A*)r3 
A(||**||  -  II**-! IIKII*/;— ill  +  A)(||**||  +  A)  ’ 
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where 


Ts  =  (||*fc-i||  -  ||*Jt||)(||*ifc-i||  +  ||**||  +  A)(l  +  xTkx*)p(k  -  1,*) 

+  xJ(xk-\  -  £fc)||*fc-i  ||(||*fc-i ||  +  A )p(k,*) 

+  ( p{k,*)-p[k  -  1, *))||*fc-i||(||*fc-i||  +  A)(l  +  x^x*)  . 

nc.e 

*!(**- 1  -  Xk)  =  xJ(Akl  -  Al\)g 

=  xjAk\Ak^-Ak)Ak\g 
—  {^k— 1  ^lc)x*  Ak  Xk—\ 

1(1 

p(k,*)-p(k-  1,*)  =  xjAz'xk  +  xjAk'x*  -  xJ^A-' xk_i  -  A^x* 

=  9TA^ALlAl'g  -  gTAk^A^Ak\g  + xJ(A^  -  4*2, )a* 
=  9T(Akh  ~  Ak2)x«  +  (Afc  -  A k-i)XT A? A^x* 

=  {\k  -  A k-\){xI-i(Akli  +  A^)A^xx  +  x'i’/lfc1 


it  follows  that 


(Afe+i  -  A*)(l  +  xj+1x*)  -  r2  = 


(Afc_!  -  At)(Afc- A.KAfc-Afc-QOtl) 
A(||xfc||  -  ||®fc-i||)(||*fc-i||  +  A)(||*fc||  +  A)  ' 


Therefore,  by  ( 19)— (20),  (29)  and  because  Afc_i,  Afc  £13  we  have 

Afe+1  -  A.  =  (Afc_!  -  A*)(Afc  -  A,)0(1) 


whenever  A*  < 

It  remains  to  analyze  the  possibility  of  A*  =  6i .  This  corresponds  to  the  hard  case, 
so  there  exists  p  £  IRn  such  that  (A  —  S\I)p  =  —g,  ||p||  =  Ap  <  A.  Since  g  _L  «5i, 
the  function  <p(\)  =  ||(4  —  A/)^||  is  strictly  increasing  for  A  <  6(+\.  Moreover,  if  g 
is  not  orthogonal  to  Sp+ 1  =  {q  £  lRn  \  Aq  =  f/^+1},  then  </?(A)  tends  to  infinity  as  A 
approaches  bp^  from  the  left.  So,  we  can  define  A0  and  A t  such  that  y>(Aa)  =  —A  and 


A  +  Ar 


.  In  case  g  _L  and  At  such  that  </?(A t) 


A  +  Ap  . 


bp+\ ,  we  define  At 


— .  Now,  let  r.3  =  min  —  A„,  At  —  #i,  -A-l 


is  greater  than 
- - -1,  so  that 


for  every  At— i  and  A*,  £  B%  =  {A  |  |A  —  |  <  r,3}  we  have 

Av  .  A  A  +  A„ 

~f  <  Afc_,,Afc  <  ~2  <  A 


(31) 
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and 


~  r3  <  A^._i,  Afc  <  +  r,3  <  £*4_]  . 


(32) 


Thus,  by  (19)  and  (31) 

T  _  $  =  (Afc-i  -  <51)11:^.-! IKHzfcH  -  A)  +  (Afc  -  ^)l|a;fc||(A  -  H^-ill) 

1  A(INII-ll^-ill) 


—  A  k  —  Ai  + 


U^ilKIM  +  lla:*--ill)(A  ~  HMl 

p(k,  k  —  1)A 


> 


Ap(A  -  Ap 
4Ap(&,  k  —  1) 


Now,  by  (21)  and  (32) 


(33) 


p(k,k-  1)  =  xl_iA\xk.i  +  xTkA\_xxk 

=  9ta\_mU  +  4)a\9 

<  I|g||(2j)n-Afc_1  -  Xk) 

(% 1  -  Afc_i)2(^+i  -  Afc)2 


Again  by  (32)  it  follows  that 


^HfflKAi  -  +  r3) 

(<^+i  -  6 1  -  r3)4 


$(-+ 1 


1  2 

- - —  <  - 

—  —  r3  6(+\  —  6\ 


and  Sn  -  Si  +r3<  ~(6n  -  Si)  , 


and  so 


p(k,  k  —  1)  < 


48jjffl|(4-£i) 
(*/+i  -<5i)4 


Therefore,  from  (33)  and  (34) 


A  —  A, 


>a,_,1  +  ^zM^±lzA)! 

“  192A||g||(«n-tf1) 


(34) 


Let 


<T  = 


Ap(A  -  Ap)(^+1  —  ^)4 


>  0. 


192A||g||(6n-^) 
r4  =  min{r3,cr}.  For  A^._i  and  A*,.  £  £4  = 


For  A^  >  <5}  —  ex,  we  have  A  >  So,  let 
jA||A-«,|  <  74}  it  follows  that,  at  iteration  k, 
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A  >  holds  and  the  ideal  safeguard  A  =  ^  should  be  used.  In  this  case  the  neighborhood 
B  is  given  by  84.  From  (11)  we  have 


ak+ 1  —  r4  +  r5  > 


where 


and 


Ts  = 


74  =  +  (1  -  io)ak 

||*fc-i||||*fc||(||*jk||  -  ||*fc-i||)  (Aa-i  -  <^i )( Afc  -  ^i) 


u\\xk\\  +  (1  -  U>)||za._ 


with  to  =  - — - — ! — .  Since 
A k  ~  Afc_i 


wll**ll  +  (l-w)||Xfc_l||  =  ||*Jfc_l||  + 


(Aa  —  A^  — 1  ) 


(Aa  -  Si)p(k,k  -  1) 


||*fe||  +  ||®*-1|| 

by  (19),  (21)  and  the  hypothesis  Aa-i ,  A  a  6  B  we  have 

Ikfc-illINlMM  -  1 )( Afc  -  )(Aa_i  -  b\) 

n  ||®fc_i||(||«fc||  +  ||**_i||)  +  (Aa  -  <S-i  )p(k,k-  1) 

=  (Aa-^KAa-,  -«i)0(1). 

Now, 

ak+]  -  a  =  u{ak-\  -  5)  +  (1  -  u)(ak  -  5)  +  r5  . 
However,  by  formulas  (3)-(4)  and  (23)  we  have 

c*j  -  a  =  Xj  -  A]  -  gT(r.j  -  p) 

=  (A j  -  Sx)(\  +  pTXj)  . 

Thus,  using  (22),  equation  (36)  can  be  rewritten  as 

(Aa  -  <5i)(Aa-i  -  <$i)(l  +pTx a) 


(Aa+i  -  <5i)(l  +  P  xk+i)  -  = 


(36) 


Aa  -  Aa-i 


(^1  ~  Aa-i  )( Aa  -  )( 1  +  pTx A-i ) 

Aa  -  Aa-i 


(Aa  -  6i)(Aa-i  -  Sl)pT(xk_1  -  xk) 
Aa  Aa— i 


(Aa  —  ^i)(Aa-i  —  b\)gT a\a\_^p  . 
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Finally,  from  (35)  and  since  Afc_i,Afc  €  B  we  have 

Ar-+i  —  <5i  =  (A^-i  —  ^i)(Afc  —  ^i)O(l) 
and  the  proof  is  complete.  □ 

Remark:  The  ideal  safeguard  A  =  8\  used  in  the  second  part  of  the  proof  of  Lemma  7, 
when  A  given  by  (9)  satisfies  A  >  £j,  is  in  practice  replaced  by  A  =  8s-  As  pointed  out  in 
§3.2,  it  is  precisely  when  the  hard  case  occurs  that  the  upper  bound  8$  is  a  sharp  estimate 
for  8\ .  Using  8$  in  place  of  greatly  reduces  the  cost  of  the  iteration  by  avoiding  Lanczos 
iterations  devoted  solely  to  the  approximation  of  8\.  Moreover,  using  8s  appears  to  be 
just  as  effective  as  using  <5i  with  respect  to  obtaining  rapid  convergence. 

Based  on  Lemma  7,  the  local  convergence  result  of  Algorithm  1  is  stated  as  follows. 

Theorem  1:  Assume  that  problem  (1)  has  a  solution  {A*,  a:*}  with  x *  on  the  boundary 
of  the  trust  region.  Then  there  exists  a  neighborhood  B  of  A»  such  that  the  sequence  of 
iterates  produced  by  Algorithm  l  using  the  two-point  scheme  beginning  with  A0,Ai  6  B  is 
well  defined,  remains  in  B  and  converges  superlinearly  to  A*.  Moreover,  if  \*  <  8\,  the 
sequence  {xk}  converges  superlinearly  to  x *  and,  if  A*  =  8\,  { x ^ }  converges  superlinearly 
to  a  vector  p  such  that  ( A  —  8^I)p  =  —g,  ||p||  <  A. 


Proof:  Let  =  A k  -  A*  and  A0,  Ai  in  the  neighborhood  of  A*,  with  radius  r,  that  is 
stated  in  Lemma  7.  From  (24)  there  exists  a  constant  c  >  0  such  that  Sk+\  <  C£k£k-i- 
Thus, 


£*+i  <  cFk+'~h(k£ok- 


(ce^Y^ce  0)Fk~ 


where  Fj  is  the  Fibonacci  number  of  order  j:  Fq  =  F\  —  1 ,  Fj+i  =  Fj  +  /')_  i ,  j  >  1.  Let 
r  =  min{r,  1  /c}.  If  A0,  Aj  €  B  =  {A  |  |A  -  A*|  <  f}  then  £fc+1  — ►  0  and  the  sequence  { A^.}  is 
well  defined,  remains  in  B  and  converges  to  A*.  Again  from  (24)  it  follows  that  A*,  — >  A* 
superlinearly.  If  A*  <  8\,  we  have  a;*  =  —  Af^g,  for  every  k  sufficiently  large  x k  =  —Aflg 
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holds  and  xk  -xm  =  (Xk  -  X *)A~'xk.  So,  if  A0,  A ^  e  B  then  xk 
Ikfc+i  -  x 


—  |Afc-i  —  A*|C7(1).  If  A*  =  6],  then  p 


£*  snperlinearly  because 

\\Xk_x  ||  —  r'/c-i  11  m >  men  y  —  —  and  xk  =  —A\g  for  any 

k.  Since  g  =  A^A^g  and  g  -  AkA\g,  it  follows  that  xk  -  p  =  (Xk  -  8\)A^xk.  Thus,  if 

ll**+l  -  Pll 


A0,  Ai  6  B  then  xk 
the  proof.  □ 


p  superlinearly  since 


II xk  ~  P\\ 


|Afc_i  —  <!>i|0(l).  This  completes 


The  next  lemma  provides  a  relationship  between  the  function  <f>  and  the  interpolating 
function  (10),  to  be  used  in  the  analysis  of  the  near  hard  case  below. 


Lemma  8:  At  any  iteration  k,  the  intermediate  point  given  by  (9)  and  the  interpolating 
function  (10)  satisfy 


xTkxk- 1  + 


||gfe-i||IMp(M  -  1)(A  -  A*_i) 


^(A)-^(Aa-)  =  (A-Afc) 

(A k  ~  A )p{k,k-  1)  +  Ikfc-i lldkfcll  +  ||**-n 
whith  p(k,k  -  1)  as  defined  in  Lemma  6. 


(37) 


Proof:  By  (10)  and  (19)  we  have 

<^(A)  =  \k-i)-u)~ — — - h  (1  -  u)(/)( Afc)  -  (1  -  u>)  7 

0  —  A  o  —  o  —  Xfo 


=  </>(Xk)  +  ugT(xk  -  xk_i)  + 


^72(A  -  Afc,!) 

(6  -  X)(6  -  Afc_i) 


(1  -  o>)72(A  -  Afc) 
(6  -  A)(<5  -  Xk) 


-  ^(Afc)  +  (A  -  Xk)xkxk _i  + 


72(A-  A,)(A-  Afc_!) 

(6  -  X)(S  -  Xk)(6  -  A*,.,) 


where  u>  - 


=  4>Ak)  +  (A  -  XAxJxk  !  +  _ ll-rA:-i  l|)(  A  -  AfcKj  -  Afc_! 

*  Hl*fc||  +  (1  -  ")||*fc-i||)(Afc  -  Ajfc_a) 

Thus,  (37)  follows  from  (19)  and  the  proof  is  complete.  □ 


Xk  —  X 
A  k  —  Xk_i 


A  few  comments  are  in  order  concerning  the  near  hard  case.  As  mentioned  in  the 
last  paragraph  of  §2,  in  a  near  hard  case,  finding  A*  <  ^  case  is  a  very  ill-conditioned 
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process.  The  difference  0i  —  A*  can  be  very  small,  to  the  extent  of  being  undetectable 
within  the  given  tolerances.  The  smaller  the  value  8\  —  A*,  the  harder  it  is  to  determine 
{A*,  £*}.  Furthermore,  rounding  errors  generally  will  convert  an  exact  into  a  near  hard 
case.  Although  h\  is  still  a  pole  of  0  when  g  is  not  exactly  orthogonal  to  the  weight 
of  such  pole  is  very  small  when  compared  to  the  other  poles.  That  is,  if  {7 j}  are  the 
coefficients  of  g  in  the  basis  of  eigenvectors  of  A  and  (6)  holds,  then  7X,  •  •  -  ,je  are  practi¬ 
cally  zero.  The  strategy  of  Algorithm  1  for  dealing  with  this  case  consists  of  building  an 
interpolating  function  that  ignores  the  pole  at  early  stages.  Information  concerning  the 
second  smallest  eigenpair  of  B„k  is  combined  with  the  movable  pole  8  >  max{AA;_i,  A^}. 
The  use  of  A ^  >  0j  happens  because  the  first  component  u\  of  the  eigenvector  (rq  uJ)T 
associated  to  Ai(afc)  is  too  small,  so  that  ||«i//z1||  =  | ) a; ^ 1 1  is  excessively  large.  There¬ 
fore  {Afc,Xfc}  =  {Ai(afc),  u\jv\ }  is  redefined  as  {Aifc**.-),  u-ijv-i).  Intuitively,  this  is  a  good 
strategy  since  in  the  exact  hard  case  this  would  continuously  select  the  correct  eigenvector 
that  will  approach  (1  when  a  tends  to  a  from  either  side.  Now,  at  iteration  k  the 
parameter  07.  is  updated  as  o/j+i  =  A  +  0(A)  with  A  <  8s,  where  either  A  <  8s,  0'(A)  =  A2 
or  A  =  6s,  4>'{8s)  <  A2.  Since  A*  <  8\  <  8s,  the  matrix  A*  is  nonsingular  and,  by  the 
same  arguments  of  the  first  part  of  the  proof  of  Lemma  7,  there  exists  a  neighborhood  B 
of  A*  such  that  A^._i,  A^  G  B  and  A  —  =  (A*  —  Ak)0(  1).  In  other  words,  the  safeguarding 

A  =  6s  is  eventually  no  longer  necessary.  By  Lemma  8,  the  neighborhood  B  and  (37) 
imply  that  0(A)  —  0(A^)  =  (A  —  A^)C9(  1 )  =  (A*  —  A^.)(9(l).  The  agreement  between  A  and 
Afc  and  between  0(A)  and  0(A^.)  drive  07.  towards  «*  =  A*  +  0(A*).  As  07.  approaches  a*, 
the  reduction  of  the  safeguarding  interval  \cxL,au}  at  every  iteration  provides  a  means  to 
avoid  the  numerical  difficulties  associated  with  a  near  hard  case  and  there  is  no  need  of 
using  information  of  the  second  eigenpair  of  B„k.  At  early  stages,  however,  it  might  be 
that  A  —  8s ■  Although  0(0 1)  is  infinite,  the  interpolating  function  value  0(05)  is  finite. 
Using  Ofc+i  =  0,y  +  0(05)  is  essential  in  keeping  the  process  under  control. 

The  regularization  of  least  squares  problems  arising  from  the  discretization  of  ill- 
posed  continuous  problems  provides  an  important  class  of  trust-region  subproblems.  It  is 
worth  mentioning  that,  for  these  problems,  Algorithm  1  converges  either  to  the  constrained 
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solution,  if  the  trust  region  radius  is  small  enough,  or  to  the  mininum  norm  interior 
solution.  In  fact,  these  problems  are  of  the  form  min{||C'a;+6||  |  ||x||  <  A}  and  for  ill-posed 
problems  the  matrix  A  —  CTC  will  be  singtilar  or  nearly  singular,  and  the  vector  g  =  CTb 
will  be  orthogonal  or  nearly  orthogonal  to  S\.  Therefore,  in  the  process  of  adjusting  the 
parameter  a,  either  Xk  converges  to  A*  <  0  such  that  ( CTC  -  \*I)x*  =  -CTb,  with 
11^*11  =  A  and  CTC  —  A*/  positive  definite  or  \k  gets  arbitrarily  close  to  zero.  In  this  last 
case,  as  ak  is  driven  to  5  =  grA^g,  the  vector  xk  tends  to  p  =  -A^g,  with  ||/;||  <  A,  that 
is,  p  is  the  minimum-norm  interior  solution. 


5  Numerical  Experiments 


In  this  section  we  present  some  numerical  experiments  to  demonstrate  the  viability  of 
our  approach.  We  coded  Algorithm  1  in  MATLAB  (version  4.1)  and  through  an  interface 
between  MATLAB  and  FORTRAN  we  used  the  Implicitly  Restarted  Lanczos  Method  (IRLM) 
implemented  in  the  package  ARPACK  [6,  1].  All  the  six  sets  of  tests  were  run  in  a  SUN 
SPARC  station  IPX.  1  he  floating  point  arithmetic  is  IEEE  standard  double  precision  with 
machine  precision  2“52  «  2.2204-  10~16.  The  first  four  set  of  tests  are  quite  similar  to  the 
experiments  presented  in  [7].  To  put  the  performance  of  our  algorithm  in  a  context,  we 
include  the  number  of  matrix-vector  products  required  by  conjugate  gradients  to  solve  the 
systems  ( A  —  A I)x  =  —g,  for  known  A.  In  the  first  and  second  sets  we  study  the  sensitivity 
of  the  algorithm,  respectively,  to  different  tolerances  and  several  sizes  of  the  trust  region 
radius,  for  problems  without  the  hard  case.  The  third  set  illustrates  the  local  superlinear 
rate  of  convergence.  In  the  fourth  set  of  tests  we  analyze  the  behavior  of  Algorithm  1 
for  problems  where  the  hard  case  occurs.  In  the  fifth  set  we  compare  the  usage  of  h\ , 
the  smallest  eigenvalue  of  A,  versus  6$,  an  upper  bound  to  ,  as  well  as  performance 
for  several  competing  choices  for  o0-  Finally,  in  the  sixth  set  we  provide  a  comparison 
between  our  algorithm  and  the  approach  proposed  by  Rend!  &  Wolkowic.z  [4], 
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5.1  Different  Tolerances 

In  the  first  experiment  the  matrix  A  on  problem  (1)  is  A  =  L  —  5/,  where  L  is  the  standard 
2-D  discrete  Laplacian  on  the  unit  square  based  upon  a  5-point  stencil  with  equally-spaced 
mesh  points.  The  shift  of  —5/  was  introduced  to  make  A  indefinite.  The  order  of  A  was 
n  —  1024  and  the  trust— region  radius  was  fixed  at  A  =  100.  We  solved  a  sequence  of 
twenty  related  problems,  differing  only  by  the  vector  g,  randomly  generated  with  entries 
uniformly  distributed  on  (0,  1).  Each  of  these  problems  was  solved  three  times,  with  the 
tolerances  £a  =  10  4,  10  6  and  10~8.  As  in  [7],  we  relaxed  the  accuracy  requirement  of 
the  eigenvalue  solution  computed  by  the  IRLM.  The  number  of  Lanczos  basis  vectors  was 
limited  to  nine  and  six  shifts  (i.e.  six  matrix-vector  products)  were  applied  on  each  im¬ 
plicit  restart.  The  stopping  criterion  to  be  accomplished  by  ARPACK  is  the  following:  by  the 
IRLM  we  have  BaV  =  VT  +  fej ,  where  VTV  —  Ij,T  6  1RJXJ  is  tridiagonal  and  VT f  —  0. 
^  id'i  'y}  is  ^  eigenpair  of  T  then  {/r,  x)  is  an  approximate  eigenpair  of  Ba ,  where  x  —  Vy 
and  the  error  in  the  approximation  is  given  by  \\Bax  -  xfi\\  =  \\f\\\ejy\.  Thus,  for  a 
fixed  tolerance  eR  e  (0, 1),  the  stopping  condition  in  ARPACK  is  (||/|||ej?/|)2  <  £R\/j,\G(j), 
which  has  to  hold  for  the  j  smallest  Ritz  pairs  {/x,y}  and  where  G(j)  is  the  usual  gap 
separation  (cf.  [3]  pp.206,  222).  For  this  set  of  experiments,  j  =  9  —  6  =  3.  We 
used  initially  eR  =  0.5  and  subsequently  eR  =  max{min{£H,  -  },£mnx},  where 

£max  —  0.125,0.100,0.075  for  £a  =  10~4, 10~6, 10-8,  respectively.  The  IRLM  was  al¬ 
ways  started  with  v  =  ( 1 ,  •  •  • ,  1  /V n  +  1  and  subsequently  the  previously  calculated 
eigenvector  corresponding  to  the  smallest  eigenvalue  was  used  as  the  initial  vector.  This 
choice  not  only  standardized  the  starting  vector  across  the  battery  of  tests,  it  also  per¬ 
formed  slightly  better  than  using  a  randomly  generated  normalized  vector.  We  did  not 
use  (0  g  )  to  start  the  IRLM  because  in  a  potential  hard— case  occurrence  this  might 
prevent  the  algorithm  from  finding  the  eigenspac.e  of  Ba  corresponding  to  Sj.  The  initial 
value  of  a  was  min{0,a[/}  and  the  initial  upper  bound  for  Si  was  chosen  as  the  minimum 
of  the  diagonal  of  A. 

In  Table  1  we  report  the  average  number  of  trust-region  iterations  (IT),  the  average 
number  of  matrix-vector  products  required  by  the  trust-region  algorithm  (TRMV)  and  the 
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average  number  of  matrix-vector  products  required  to  solve  the  system  ( A  —  A Xl)x  —  —g 
using  the  conjugate-gradient  method  (CGMV),  where  A*  is  the  optima]  value  obtained  by 
the  trust-region  algorithm. 


£A 

IT 

TRMV 

CGMV 

TRMV 

CGMV 

10-4 

5.00 

56.70 

43.35 

1.31 

10~6 

7.30 

94.20 

57.00 

1.65 

10-8 

8.35 

124.95 

71.55 

1.75 

Table  1:  Average  behavior  for  different  tolerances. 

Here  the  behavior  of  the  results  presented  in  [7]  is  reproduced:  a  trust  region  solution 
requires  fewer  than  twice  as  many  matrix-vector  products  on  average  than  the  number 
needed  to  solve  a  single  linear  system  to  the  same  accuracy,  using  conjugate  gradients. 

5.2  Different  Sizes  for  the  Trust-Region  Radius 

In  the  second  experiment  the  matrix  A  on  problem  (1)  is  of  the  form  A  =  U DUT  with  D 
diagonal  and  U  —  I  —  2 uur ,  uT u  =  1.  The  elements  of  D  were  randomly  selected  from 
a  uniform  distribution  on  (—5,5).  Both  vectors  u  and  g  were  randomly  generated  with 
entries  uniformly  distributed  on  (—0.5, 0.5)  and  then  u  was  normalized  to  have  unit  length. 
The  matrix  A  was  of  order  n  =  1000.  The  trust  region  radius  varied  by  a  factor  of  10 
through  the  values  100,  10, ...,  0.0001  and  each  problem  was  solved  within  the  tolerance 
£a  =  10  6.  The  parameters  for  the  IRLM  were  the  following:  nine  Lanczos  basis  vectors, 
six  shifts  on  each  implicit  restart,  initial  tolerance  for  ARPACK  Sr  =  0.03  for  A  =  100, 
£r  —  0.1  for  A  =  10  and  £r  =  0.25  for  A  <  10.  Afterwards,  the  value  of  er  was  kept 
the  same  until  —  <  0.1,  when  er  =  0.015  for  A  =  100,  sR  =  0.05  for  A  =  10 

and  er  =  0.125  for  A  <  10.  The  initial  value  of  a  and  6s  were  min{0,O[/}  and  —4.5, 
respectively. 

At  each  iteration,  once  A*,  was  determined  as  the  smallest  eigenvalue  of  Bak,  we 
applied  the  conjugate-gradient  method  to  solve  the  linear  system  (A  -  A kI)x  =  - g ,  for 
the  sake  of  comparison.  Each  system  was  solved  by  conjugate  gradients  to  the  same  level 
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of  accuracy  as  the  solution  provided  from  the  eigenvalue  problem.  The  total  number  of 
matrix-vector  products  required  by  the  eigenvalue  method  (TRMV)  is  to  be  compared 
to  the  number  required  by  the  conjugate-gradient  method  (CGMV).  These  results  are 
presented  in  Table  2,  where  IT  is  the  number  of  trust-region  iterations.  We  selected 
these  results  from  a  set  of  ten  problems  generated  with  different  seeds,  most  of  them  with 
practically  the  same  behavior. 


A 

100 

10 

1 

.1 

.01 

.001 

.0001 

IT 

12 

10 

4 

4 

4 

4 

3 

TRMV 

636 

216 

48 

48 

48 

48 

33 

CGMV 

1034 

329 

47 

34 

25 

20 

13 

||S+M— A„/)*,|| 

...  y 

10~8 

10~7 

io-11 

10-16 

10-16 

10-ie 

10~15 

MB 

io-7 

10~9 

10-H 

io-7 

Table  2:  Behavior  for  different  sizes  for  the  trust-region  radius. 


The  same  comments  made  in  [7]  are  in  order:  the  conjugate-gradient  method  has  a 
much  easier  time  for  smaller  values  of  A  than  for  larger  ones. 

5.3  Superlinear  Convergence 

In  the  third  experiment  the  matrix  A  is  again  set  to  A  =  L  -  5/  with  L  the  2-D  discrete 
Laplacian  on  the  unit  square,  but  now  n  =  256.  The  vector  g  was  randomly  generated 
with  entries  uniformly  distributed  in  (—0.5, 0.5).  For  problems  without  hard  case,  the 
trust-region  radius  and  the  tolerance  were  set  respectively  at  A  =  10  and  eA  =  10"11. 
For  problems  with  hard  case,  we  used  A  =  100,  ehc  =  10-11  and  su  =  10~2.  To  generate 
the  hard  case,  we  performed  the  operation  g  <—  g  —  q(qTg)  to  orthogonalize  the  vector 
<y,  randomly  generated  as  before,  against  q,  the  eigenvector  of  A  corresponding  to  the 
smallest  eigenvalue  <i) .  The  eigenproblems  were  solved  by  the  internal  solver  of  MATLAB. 
The  initial  values  for  a  and  6s  were  the  same  as  in  §5.1. 
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k  A-INII 

1  8.4717E— 01 

2  1.1806E+01 

3  4.0576E+00 

4  2.0839E— 01 

5  5.3531E— 02 


6 

7 

8 

9 


2.0518E— 03 
2.4828E— 05 
1.2216E-08 
8.2245E-14 


(a) 


2 


3 


4 

5 

6 

7 

8 


4.3891E+01 
1.0486E+01 
1.924  IE- 02 
1.6997E— 02 
9.9393E— 03 
8.6270E— 05 
8.8998E-07 


9 


2.4546E-13 


(b) 


Table  3:  Verifying  superlinear  convergence:  (a)  easy  and  (b)  hard  case. 


In  Tables  3. a  and  3.b  we  monitor  the  progressive  decrease  in  the  magnitude  of 

as  iteration  k  proceeds.  The  optimal  pair  {A*,£*}  satisfied 
for  problem  (a)  and  =  io~u  for  problem  (b). 


A-IMI 

and 

A 

\a+(A- A, 

Tkdl 

llflll 

t*(z1Az- \k) 
g'1  xk+\kA2 

14 


=  io- 


5.4  The  Hard  Case 

In  the  fourth  experiment  the  matrix  A  is  of  the  same  form  used  in  the  second  set  of  tests: 
A  =  U DUt ,  D  diagonal  and  U  =  I  —  2uuT,  uTu  =  1 .  The  elements  of  D  =diag(ch , . . . ,  dn) 
were  randomly  generated  and  uniformly  distributed  on  (—5,5).  Then  we  sorted  and  set 
d\  =  —5  so  that  d\  —  —  d(  <  ef/+1  <  •  •  •  <  dn,  allowing  multiplicity  l  for  the  smalest 

eigenvalue  of  A.  The  order  of  matrix  A  was  n  =  1000.  Both  vectors  u  and  g  were  randomly 
generated  with  entries  selected  from  a  uniform  distribution  on  (  —  0.5, 0.5)  and  then  u  was 
normalized  to  have  unit  length.  In  this  case  we  know  the  corresponding  eigenvectors  of 
matrix  A:  q\  =  e;  —  2 uu{ ,  i  —  1 , . . . ,  n,  where  e;  is  the  z-th  unit  canonical  vector  of  IRn  and 
ut  is  the  th  component  of  vector  u.  Therefore,  we  have  complete  control  in  the  generation 
of  a  hard  case.  In  fact,  if  i  =  1  we  orthogonalized  g  against  r/j .  Otherwise  we  computed 

n—i 

an  orthonormal  basis  Z  for  the  null  space  of  Q  =  [71  •  •  -qi]T ,  set  g  =  ^  z,-,  where  z;  is  the 
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i- th  column  of  Z.  Finally,  we  added  a  noise  vector  c  of  norm  10  8  to  g  and  normalized  it: 
9  •  To  ensure  that  the  hard  case  really  occurred,  we  computed 

(38) 

where  7  =  UT  g  and  set  A  =  2Amm. 

The  problems  were  solved  to  the  level  e/ic  =  10-6,  with  £„  —  10~2.  The  param¬ 
eters  for  the  IRLM  were  chosen  as  follows:  ten  Lanczos  basis  vectors,  six  shifts  on 
each  implicit  restart,  the  initial  tolerance  for  ARPACK  was  £r  =  10~4  and  then  £r  = 
min  [}•  The  initiaJ  values  of  a  and  6S  were  respectively 

min{0,  ajj}  and  —4.5.  We  compared  the  performance  of  Algorithm  1  with  the  the  algo¬ 
rithms  proposed  in  [7],  using  the  same  parameters  specified  above  in  the  code. 

In  Table  4. a  we  summarize  the  average  results  of  a  sequence  of  ten  problems,  generated 
with  different  seeds.  We  also  generated  problems  with  near  hard  case  by  adding  a  noise 
vector  c  to  g  of  norm  10-2.  The  comparative  results  are  reported  in  Table  4.b. 


t 

IT 

TRMV 

\\g  +  (A  -  A*/)x*|| 

(Alg-1,  [7]) 

(Alg.l,  [7]) 

(Alg.l,  [7]) 

1 

(9.0,  7.1) 

(2340.6,  2232.8) 

(10~4,  10“3) 

5 

(11.0,  7.2) 

(2940.2,  2221.8) 

O 

1 

O 

1 

CA 

10 

(10.8,  7.2) 

(2830.8,  2193.6) 

O 

1 

O 

1 

W 

(a) 

i 

IT 

TRMV 

\\g  +  (A-  A*/)*»|| 

(Alg.l,  [7]) 

(Alg.l,  [7]) 

(Alg.l,  [7]) 

1 

(12.0,  15.2) 

(3243.6,  4377.2) 

O 

1 

ffe. 

O 

1 

w 

5 

(12.3,  31.8) 

(3257.4,  9550.8) 

(10-4, 10-4) 

10 

(12.4,  37.6) 

(3271.0,  11552.0) 

(10“4, 10-4) 

(b) 

Table  4:  Behavior  for  ^-dimensional  eigenspac.e  S\\  (a)  hard  case  and  (b)  near  hard  case 


Amin  =  \\(A-  d^gW  =  \\(D-d1I)'UTg\\=  £ 

\  s±Tf,  {di-d !)  ) 
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5.5  Initializing  a 


In  the  fifth  experiment  the  matrix  A  on  problem  (1)  is  again  A  =  UDIJT,  generated  as 
in  §5.4,  except  that  l  —  1  was  used.  The  order  of  A  was  n  =  1000  and  the  vectors  u 
and  g  had  their  entries  randomly  selected  from  a  uniform  distribution  on  (  —  0.5,  0.5).  The 
vector  u  was  normalized  to  have  unit  length.  The  vector  g  was  orthogonalized  against 
q\  —  C\  —  2uU] ,  a  noise  vector  of  norm  10-8  was  added  to  it  and  then  it  was  normalized  so 
that  g  also  had  unit  length.  Computing  A TOjn  as  in  (38),  we  generated  two  set  of  tests:  with 
and  without  hard  case,  respectively  using  A  =  5A„un  and  A  =  0.2Am*n.  The  problems 
were  solved  to  the  level  £a  =  10-6,  £/lc  =  10-6,  with  £„  =  10-2.  For  the  IRLM,  we  used 


nine  or  ten  Lanczos  basis  vectors,  respectively  when  A  =  0.2Am;„  or  A  =  5A In  both 
cases  six  shifts  were  applied  on  each  implicit  restart.  Initially,  the  tolerance  for  ARPACK 


was  sp  =  10  2  when  A  =  0.2A7 


•  /  A-INII 

eR  =  nun  [eR,  ^oooa  . 


^(z^Az-Xk) 

W00(gTxk+Xk&2) 


and  sr  =  10  4  when  A  =  5A,; 

by|  ’  1(r'!}- 


Subsequently, 


By  applying  the  IRLM  initially  to  matrix  A,  in  this  test  we  were  able  not  only  to 


replace  the  upper  bound  As  in  Algorithm  1  by  A  but  also  to  adopt  the  hot-start  strategy 
for  initializing  a  (see  §3.3).  By  way  of  comparison,  besides  using  the  hot  start  (HS),  we 
included  a0  =  min{0,  Of/},  «o  =  and  o0  =  As  for  both  family  of  problems  (A  =  0.2Amtn, 
A  =  5A,n,„).  Initially,  As  =  -4.5.  The  average  results  corresponding  to  ten  problems 


generated  by  different  seeds  are  reported  in  Table  5. 

A  —  0.2Amtn  A  —  oAmin 
a0  IT  TRMV  IT  TRMV 

Using  min  {0,  at/}  8.2  2122.8  12.0  3303.8 

A  A  7.0  1858.2  14.1  3697.4 

HS  7.2  1922.0  13.4  3562.6 

Using  min{0,  at/}  7.6  2122.0  10.5  2973.6 

As  As  6.6  1835.4  10.0  2790.4 

Table  5:  Average  behavior  for  different  ao- 


The  hot  start  for  a  improves  the  convergence  by  reducing  the  number  of  iterations,  es¬ 
pecially  when  compared  with  a0  =  min{0,  a//}.  However,  it  does  not  outperform  a0  =  As- 
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Moreover,  the  usage  of  d>i  in  the  algorithm  instead  of  6s  does  not  seem  worthwhile,  as  can 
be  seen  by  comparing  a o  =  min {0,  a//}  in  both  cases. 


5.6  Comparison  with  Rendl  &i  Wolkowicz 

In  the  sixth  experiment  we  solved  two  different  family  of  problems.  First  the  matrix  A  is 
A  =  L  —  57,  with  order  n  —  256.  The  trust-region  radius  was  set  to  A  =  100.  We  solved 
ten  related  problems,  differing  by  the  vector  g ,  randomly  generated  with  entries  uniformly 
distributed  on  (0,1).  As  in  §5.3,  an  orthogonalization  of  g  against  the  eigenvector  of  A 
corresponding  to  6\  generated  a  hard  case.  We  also  added  a  noise  vector  to  g,  with  norm 
10~8.  The  second  family  of  problems  has  A  =  U DUT ,  generated  exactly  as  in  §5.5,  with 
order  n  =  256.  The  tolerances  used  for  Algorithm  1  were  £^  =  10-6,  £/lc  =  10~6  and 
£„  =  10~2.  Ten  problems  of  each  type  (easy  and  hard  case)  were  generated  with  different 
seeds  for  each  family  and  solved  by  both  our  algorithm  and  Rendl  &  Wolkowicz’s  code 
(RW).  In  both  codes  the  eigenproblems  were  solved  by  the  internal  solver  of  MATLAB.  The 
average  results  are  reported  in  Table  6. 


IT 

Ibll 

a-IMI 

A 

h»l 

VO 

II 

T 

easy 

case 

Alg.  1 

5.0 

10“13 

10-7 

RW 

4.8 

10"2 

10-13 

hard 

case 

Alg.  1 

8.9 

10“7 

IQ-16 

RW 

9.1 

10"7 

10~15 

A  =  UDUt 

easy 

case 

Alg.  1 

6.4 

10-13 

10-7 

RW 

7.9 

10-4 

10-11 

hard 

case 

Alg.  1 

7.1 

10~5 

10"15 

RW 

1 1 .5 

10-7 

10-15 

Table  6:  Comparison  between  Algorithm  1  and  Rendl  &  Wolkowicz’s  code. 
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6  Conclusions 

We  have  presented  a  new  variant  of  an  algorithm  for  the  large-scale  trust-region  subprob¬ 
lem.  The  algorithm  is  based  upon  embedding  the  trust-region  problem  into  a  family  of 
parameterized  eigenvalue  problems  as  developed  in  [7].  The  main  contribution  of  this  pa¬ 
per  has  been  to  give  a  better  understanding  of  the  hard-case  condition  and  to  utilize  this 
understanding  to  develop  a  better  treatment  of  this  case.  The  result  has  been  a  unified 
algorithm  that  naturally  incorporates  both  the  standard  and  hard-case  problems. 

Superlinear  convergence  has  been  proved  for  this  new  algorithm  and  demonstrated 
computationally  for  both  the  standard  and  hard  cases.  This  represents  a  major  improve¬ 
ment  over  the  performance  of  the  method  originally  presented  in  [7].  In  that  approach, 
a  different  iteration  was  devised  for  the  hard  case  that  was  not  superlinearly  convergent. 
Moreover,  in  practice  this  seemed  to  occur  often  and  greatly  detracted  from  the  perfor¬ 
mance.  Our  computational  results  show  this  new  approach  overcomes  these  difficulties 
while  retaining  the  good  performance  of  the  original  algorithm  for  the  standard  case.  We 
also  compared  our  approach  to  the  one  of  Rendl  &  Wolkowicz  [4],  In  that  comparison  we 
used  the  EIG  function  from  MATLAB  to  supply  the  eigenvalues  so  that  both  methods  were 
getting  the  same  level  of  accuracy  in  the  eigenvalue  calculation.  Thus,  only  the  perfor¬ 
mance  of  the  basic  algorithms  were  compared  and  the  inconsistencies  introduced  by  having 
two  different  eigensolvers  and  different  stopping  criteria  were  avoided.  These  tests  indicate 
a  marginal  advantage  for  our  algorithm  in  terms  of  the  number  of  eigenvalue  problems 
that  need  to  be  solved  in  order  to  solve  the  given  trust-region  subproblem.  We  believe 
this  is  partially  due  to  the  need  for  the  Rendl  &  Wolkowicz  to  determine  the  smallest 
eigenvalue  6 1  of  A  in  order  to  begin  the  major  iteration.  Our  approach  avoids  this  extra 
calculation,  finally,  our  approach  seems  to  be  better  suited  to  obtaining  accuracy  in  the 
final  solution  to  (A  —  A I)x  =  —g. 

Future  work  in  this  area  should  include  a  study  of  this  approach  for  the  regularization 
of  ill-posed  problems  such  as  those  arising  in  seismic,  inversion  [8].  We  feel  that  a  further 
refinement  of  this  approach  is  likely  to  be  needed  for  this  class  of  problems.  In  particular, 
near  hard— case  conditions  seem  to  be  associated  with  these  problems. 
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(a) 


(c) 


(b) 


(d) 


Figure  1:  Example  of  the  typical  pattern  of  </>( A)  (solid)  and  the  straight  line  /(A)  =  a*  —  A 
(dashdotted).  The  three  smallest  eigenvalues  of  A  are  —2,  —0.5  and  2.  (a)  general  case 
with  the  slope  at  A*  also  plotted;  (b)  exact  hard  case;  (c)  near  hard  case;  (d)  detail  of  the 
box  indicated  in  (c). 


